A Machine Learning Approach to Relational Noun Mining in German
Abstract
In this paper I argue in favour of a collocation extraction approach to the acquisition of relational nouns in German. We annotated frequency-based best lists of noun-preposition bigrams and subsequently trained different classifiers using (combinations of) association metrics, achieving a maximum F-measure of 69.7 on a support vector machine (Platt, 1998). Trading precision for recall, we could achieve over 90% recall for relational noun extraction, while still halving the annotation effort.

1 Mining relational nouns: almost a MWE extraction problem

A substantial minority of German nouns are characterised by having an internal argument structure that can be expressed as syntactic complements. A non-negligible number of relational nouns are deverbal, inheriting the semantic argument structure of the verbs they derive from. In contrast to verbs, however, complements of nouns are almost exclusively optional.

The identification of relational nouns is of great importance for a variety of content-oriented applications. First, precise HPSG parsing for German cannot really be achieved if a high number of noun complements is systematically analysed as modifiers. Second, the recent extension of Semantic Role Labeling to the argument structure of nouns (Meyers et al., 2004) increases the interest in lexicographic methods for the extraction of noun subcategorisation information. Third, relational nouns are also a valuable resource for machine translation, separating the more semantic task of translating modifying prepositions from the more syntactic task of translating subcategorised-for prepositions.

Despite its relevance for accurate deep parsing, the German HPSG grammar developed at DFKI (Müller and Kasper, 2000; Crysmann, 2003; Crysmann, 2005) currently only includes 107 entries for proposition-taking nouns, and lacks entries for PP-taking nouns entirely.
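As a rough illustration of the approach outlined in the abstract (scoring noun-preposition bigrams with an association metric and evaluating extraction by F-measure), the following Python sketch may help. It is not the author's implementation: the text does not say which association metrics were used, so pointwise mutual information (PMI) is an assumption here, and the function names `pmi_scores` and `f_measure` as well as the toy bigram counts are hypothetical.

```python
import math
from collections import Counter


def pmi_scores(bigrams):
    """Pointwise mutual information for each (noun, preposition) pair.

    PMI is an assumed example of an association metric; the source
    does not name the metrics it actually combines.
    """
    pair_counts = Counter(bigrams)
    noun_counts = Counter(noun for noun, _ in bigrams)
    prep_counts = Counter(prep for _, prep in bigrams)
    total = len(bigrams)
    scores = {}
    for (noun, prep), count in pair_counts.items():
        p_pair = count / total
        p_noun = noun_counts[noun] / total
        p_prep = prep_counts[prep] / total
        scores[(noun, prep)] = math.log2(p_pair / (p_noun * p_prep))
    return scores


def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Invented toy counts, for illustration only (not data from the paper).
toy_bigrams = (
    [("Hoffnung", "auf")] * 8
    + [("Hoffnung", "in")] * 2
    + [("Haus", "in")] * 9
    + [("Haus", "auf")] * 1
)
scores = pmi_scores(toy_bigrams)

# Rank candidate pairs from strongest to weakest association.
ranked = sorted(scores, key=scores.get, reverse=True)
```

In a setup like the one the abstract describes, scores of this kind would serve as classifier features, and choosing `beta` greater than 1 in `f_measure` weights recall more heavily, which mirrors the precision-for-recall trade-off mentioned above.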
In terms of subcategorisation properties, relational nouns in German can be divided up into 3 classes:

• nouns taking genitival complements (e.g., Beginn der Vorlesung ‘beginning of the lecture’, Zerstörung der Stadt ‘destruction of the city’)
• nouns taking propositional complements, either a complementiser-introduced finite clause (der Glaube, daß die Erde flach ist ‘the belief that earth is flat’), or an infinitival clause (die Hoffnung, im Lotto zu gewinnen ‘the hope to win the lottery’), or both
• nouns taking PP complements

In this paper, I will be concerned with nouns taking prepositional complements, although the method described here can also be easily applied to the case of complementiser-introduced propositional complements. In fact, I expect the task of mining relational nouns taking finite propositional complements to be far easier, owing to a reduced ambiguity of the still relatively local complementiser
Similar resources
Adding Data Mining Support to SPARQL Via Statistical Relational Learning Methods
In machine learning/data mining research, people have been exploring how to learn models of relational data for a long time. The rationale behind this is that exploiting the complex structure of relational data makes it possible to build better models by taking into account the additional information provided by the links between objects. These links are usually hard to model by traditional propositional ...
Identifying Features from Opinion Mining Using Fine-Grained Relational Topic Weighted Approach
Opinion feature extraction is a subproblem of opinion mining analyzed at document, sentence, or even phrase (word) levels. Document-level (sentence-level) opinion mining is classified as overall subjectivity or sentiment, expressed in an individual review document. The existing approaches to opinion feature extraction depended on mining patterns from a particular evaluate corpus disregard non...
Supervised Learning of German Qualia Relations
In the last decade, substantial progress has been made in the induction of semantic relations from raw text, especially of hypernymy and meronymy in the English language and in the classification of noun-noun relations in compounds or other contexts. We investigate the question of learning qualia-like semantic relations that cross part-of-speech boundaries for German, by first introducing a han...
Logical and Relational Learning
I use the term logical and relational learning (LRL) to refer to the subfield of machine learning and data mining that is concerned with learning in expressive logical or relational representations. It is the union of inductive logic programming, (statistical) relational learning and multi-relational data mining and constitutes a general class of techniques and methodology for learning from str...
Probabilistic Relational Model Benchmark Generation
The validation of any database mining methodology goes through an evaluation process where the availability of benchmarks is essential. In this paper, we aim to randomly generate relational database benchmarks that make it possible to check probabilistic dependencies among the attributes. We are particularly interested in Probabilistic Relational Models (PRMs), which extend Bayesian Networks (BNs) to a relationa...
Publication date: 2011